Defining Syntax for Learner Language Annotation

Authors

  • Marwa Ragheb
  • Markus Dickinson
Abstract

We discuss how to make syntactic annotation for learner language more precise by clarifying the properties to which the layers of annotation refer. Building on previous proposals that split linguistic annotation into multiple layers to capture non-canonical properties of learner language, we lay out the questions that must be asked for grammatical annotation and provide some answers. Our investigation indicates that the layer of distributional syntax is based on properties of the target language (L2) and is largely redundant with the other layers. We show, for example, that subcategorization seems better able to underspecify annotation in situations where no single correct solution can be found. While this paves the way for applying the annotation to larger corpus efforts, it also represents a significant step in elucidating syntax for non-canonical language.

Similar resources

Annotating Errors in a Hungarian Learner Corpus

We are developing and annotating a learner corpus of Hungarian, composed of student journals from three different proficiency levels written at Indiana University. Our annotation marks learner errors that are of different linguistic categories, including phonology, morphology, and syntax, but defining the annotation for an agglutinative language presents several issues. First, we must adapt an ...

First steps towards an ISO standard for annotating discourse relations

This paper describes initial studies in the context of a new effort within ISO to design an international standard for the annotation of discourse with semantic relations that are important for its coherence, “discourse relations”. This effort takes the Penn Discourse Treebank (PDTB) as its starting point, and applies a methodology for defining semantic annotation languages which distinguishes ...

Towards a multi-layered dependency annotation of Finnish

We present a dependency annotation scheme for Finnish which aims at respecting the multilayered nature of language. We first tackle the annotation of surface-syntactic structures (SSyntS) as inspired by the Meaning-Text framework. Exclusively syntactic criteria are used when defining the surface-syntactic relations tagset. Our annotation scheme allows for a direct mapping between surface-syntax ...

Inter-annotator Agreement for Dependency Annotation of Learner Language

This paper reports on a study of inter-annotator agreement (IAA) for a dependency annotation scheme designed for learner English. Reliably annotated learner corpora are a necessary step for the development of POS tagging and parsing of learner language. In our study, three annotators marked several layers of annotation over different levels of learner texts, and they were able to obtain generall...

On Grammaticality in the Syntactic Annotation of Learner Language

We examine some non-canonical annotation categories that license missing material (ellipses and enumerations). In extending these categories to learner data, the distinctions seem to require an annotator to determine whether a sentence is grammatical or not when deciding between particular analyses. We unpack the assumptions surrounding the annotation of learner language and how these particula...

Publication date: 2012